Joint Alignment and Artificial Data Generation: An Empirical Study of Pivot-based Machine Transliteration

Authors

  • Min Zhang
  • Xiangyu Duan
  • Ming Liu
  • Yunqing Xia
  • Haizhou Li
Abstract

In this paper, we first investigate two existing pivot strategies for statistical machine transliteration, namely the system-based and model-based strategies, to determine why the previous model-based strategy performs much worse than the system-based one. We then propose a joint alignment algorithm that optimizes transliteration alignments jointly across the source, pivot and target languages to improve the performance of the model-based strategy. In addition, we propose a novel synthetic data-based strategy, which artificially generates source-target data using the pivot language. Experimental results on benchmark data show that the proposed joint alignment algorithm significantly improves the accuracy of the model-based strategy, and that the proposed synthetic data-based strategy is very effective for pivot-based machine transliteration.
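The abstract names two pivot strategies that work without any direct source-target training data. The sketch below illustrates them under stated assumptions and is not the paper's implementation.

- **System-based cascade.** The sketch takes the n-best pivot candidates for a source name, transliterates each candidate into the target, and sums P(p|s) · P(t|p) over the pivot candidates.
- **Synthetic data.** The abstract does not say how the synthetic source-target data is produced. The sketch assumes one plausible reading: the pivot side of a source-pivot corpus is transliterated with a pivot-target model.

The probability tables, placeholder names (`src_a`, `piv_x`, `tgt_1`, ...), and function names are all hypothetical stand-ins for trained transliteration models.

```python
from collections import defaultdict

# Hypothetical toy "models": each maps an input name to candidate outputs with
# probabilities. Real systems would use trained transliteration models; these
# tables are illustrative placeholders only.
SOURCE_TO_PIVOT = {
    "src_a": {"piv_x": 0.7, "piv_y": 0.3},
}
PIVOT_TO_TARGET = {
    "piv_x": {"tgt_1": 0.6, "tgt_2": 0.4},
    "piv_y": {"tgt_2": 0.9, "tgt_3": 0.1},
}


def system_based(source, source_to_pivot, pivot_to_target, n_best=2):
    """Cascade sketch: keep the n-best pivot candidates for `source`, then
    score each target t by summing P(p|s) * P(t|p) over those pivots."""
    pivot_candidates = source_to_pivot.get(source, {})
    top_pivots = sorted(pivot_candidates.items(), key=lambda kv: -kv[1])[:n_best]
    scores = defaultdict(float)
    for pivot, p_pivot_given_source in top_pivots:
        for target, p_target_given_pivot in pivot_to_target.get(pivot, {}).items():
            scores[target] += p_pivot_given_source * p_target_given_pivot
    # Highest-scoring target candidates first.
    return sorted(scores.items(), key=lambda kv: -kv[1])


def synthesize_pairs(source_pivot_corpus, pivot_to_target):
    """Synthetic-data sketch (assumed procedure): replace the pivot side of
    each source-pivot pair with its most probable target transliteration."""
    synthetic = []
    for source, pivot in source_pivot_corpus:
        candidates = pivot_to_target.get(pivot)
        if not candidates:
            continue  # no target transliteration known for this pivot name
        best_target = max(candidates, key=candidates.get)
        synthetic.append((source, best_target))
    return synthetic


if __name__ == "__main__":
    print(system_based("src_a", SOURCE_TO_PIVOT, PIVOT_TO_TARGET))
    corpus = [("src_a", "piv_x"), ("src_b", "piv_y"), ("src_c", "piv_z")]
    print(synthesize_pairs(corpus, PIVOT_TO_TARGET))
```

Summing over several pivot candidates lets the cascade recover from a wrong top-1 pivot. In the toy tables, `tgt_2` outranks `tgt_1` because it is reachable through both pivots. The synthetic pairs could then train a direct source-target model, but how the paper actually builds and uses that data is not stated in the abstract.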

Similar resources

Transliteration normalization for Information Extraction and Machine Translation

Keywords: Arabic; Named Entity Recognition; Transliteration; Name normalization; Information Extraction; Machine Translation
Abstract: Foreign name transliterations typically include multiple spelling variants. These variants cause data sparseness and inconsistency problems, increase the Out-of-Vocabulary (OOV) rate, and present challenges for Machine Translation, Information Extraction and other natural ...

Machine Transliteration: Leveraging on Third Languages

This paper presents two pivot strategies for statistical machine transliteration, namely system-based pivot strategy and model-based pivot strategy. Given two independent source-pivot and pivot-target name pair corpora, the model-based strategy learns a direct source-target transliteration model while the system-based strategy learns a source-pivot model and a pivot-target model, respectively. Ex...

The Application of Bayesian Alignment Techniques to Transliteration Generation and Mining

Bayesian techniques have recently been applied to many areas of natural language processing, and have proven themselves particularly useful in areas involving segmentation and alignment. This paper looks at the direct application of these techniques to the co-segmentation/alignment of grapheme sequences. We detail a novel Bayesian model for unsupervised bilingual character sequence alignment of...

Bubble Pressure Prediction of Reservoir Fluids using Artificial Neural Network and Support Vector Machine

Bubble point pressure is an important parameter in equilibrium calculations of reservoir fluids and has other applications in reservoir engineering. In this work, an artificial neural network (ANN) and a least square support vector machine (LS-SVM) have been used to predict the bubble point pressure of reservoir fluids. Also, the accuracy of the models has been compared to two-equation stat...

A Bayesian model of bilingual segmentation for transliteration

In this paper we propose a novel Bayesian model for unsupervised bilingual character sequence segmentation of corpora for transliteration. The system is based on a Dirichlet process model trained using Bayesian inference through blocked Gibbs sampling implemented using an efficient forward filtering/backward sampling dynamic programming algorithm. The Bayesian approach is able to overcome the o...

Publication date: 2011